Method and apparatus for establishing network performance model

ABSTRACT

A method and apparatus for establishing a network performance model. The method includes: determining, according to performance data provided by network nodes and the probability of the performance data, a parameter α showing the correlation of the performance data of different network nodes in a whole network and a parameter β showing the distribution pattern of the performance data in the network; and establishing a Latent Dirichlet Allocation, LDA, network performance model by using the determined parameter α and the parameter β.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of Chinese Application No. 200710151586.1, filed Sep. 28, 2007. The disclosure of the above application is incorporated herein by reference.

FIELD

The present disclosure relates to computer network technologies, and particularly to a method and apparatus for establishing a network performance model.

BACKGROUND

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

As network technologies develop rapidly, the numbers of users and new services keep growing, and network operators must strive to provide the best services to survive in an intensely competitive market. Network performance has therefore become a focus of attention. In practice, an operator usually runs network environment simulations to evaluate network performance for network planning, optimization and Quality of Service (QoS) control. A network performance model that follows the operating principles of actual networks can be established in such a simulation, and the established model can then be used to simulate an actual network environment.

In the existing technology, the network performance model is a Gaussian mixture model. Establishing a Gaussian mixture model basically begins with some of the network nodes in a network providing performance data. The performance data are described by a plurality of components that affect them, each of which follows a Gaussian distribution; these components are therefore called Gaussian components. In the established Gaussian mixture model, the density of the performance data is the weighted sum of all the Gaussian components. Suppose a piece of performance data is described by N Gaussian components, where the j-th Gaussian component has mean μ_(j), variance σ_(j)² and mixture weight ω_(j) (the weights summing to 1). The probability density function of the performance data is then

$p(s\mid\theta)=\sum_{j=1}^{N}\omega_j\,\mathcal{N}_s(\mu_j,\sigma_j^2)$

where θ=(ω_(j), μ_(j), σ_(j)²), s is the performance data, and 𝒩_s(μ_(j), σ_(j)²) is the Gaussian density with mean μ_(j) and variance σ_(j)² evaluated at s. The probability density function gives the probability density of the performance data once its Gaussian components are determined. A matrix can be formed with the performance data on the rows and the Gaussian components on the columns, where each entry is the probability density corresponding to the performance data of its row and the Gaussian component of its column. The matrix therefore shows how the Gaussian components are distributed over all measured performance data, and a simulation environment can be established by using the matrix as the parameters of the network performance model.
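As an illustration of the density above, the following Python sketch evaluates p(s|θ) and builds the sample-by-component matrix. It assumes one-dimensional performance data (for example, delay values) and takes each matrix entry to be the unweighted component density; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_pdf(s: float, mu: float, var: float) -> float:
    """Univariate normal density N_s(mu, var) evaluated at s."""
    return math.exp(-(s - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(s: float, weights: Sequence[float], means: Sequence[float],
            variances: Sequence[float]) -> float:
    """p(s | theta) = sum_j omega_j * N_s(mu_j, sigma_j^2)."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(s, m, v) for w, m, v in zip(weights, means, variances))


def density_matrix(samples: Sequence[float], means: Sequence[float],
                   variances: Sequence[float]) -> List[List[float]]:
    """Rows: performance data samples; columns: Gaussian components.
    Entry [r][j] is the density of sample r under component j."""
    return [[gaussian_pdf(s, m, v) for m, v in zip(means, variances)] for s in samples]
```

Weighting a row of the matrix by the mixture weights and summing it reproduces p(s|θ) for that sample.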

However, when the Gaussian mixture model is used as the network performance model, the Gaussian component weights are derived solely from the sample performance data provided by the network nodes; that is, the model is established for, and reliably shows the performance of, only the network nodes that provided the performance data. The performance data of other network nodes in the network are not reflected in the model, so a network performance model established with the conventional method does not suit the whole network and is not reliable in showing its performance. In short, although the network consists of network nodes with the same aggregation features, and the nodes providing performance data can be chosen at random from this aggregation space, the resulting model does not fit the network as a whole.

SUMMARY

The present disclosure provides a method and apparatus for establishing a network performance model which is reliable in showing the performance of a whole network. The method for establishing a network performance model includes:

determining, according to performance data provided by network nodes and the probability of the performance data, a parameter α showing the correlation of the performance data of different network nodes in a whole network and a parameter β showing the distribution pattern of the performance data in the network; and

establishing a Latent Dirichlet Allocation, LDA, network performance model by using the determined parameter α and the parameter β.

The apparatus for establishing a network performance model includes:

a parameter determining unit, adapted to determine, according to performance data provided by network nodes and the probability of the performance data, a parameter α showing the correlation of the performance data of different network nodes in a whole network and a parameter β showing the distribution pattern of the performance data in the network; and a model establishing unit, adapted to establish a Latent Dirichlet Allocation, LDA, network performance model by using the parameter α and parameter β determined by the parameter determining unit.

It can be seen from the technical scheme that the method and apparatus provided by embodiments of the present disclosure can determine, from the performance data provided by network nodes, a parameter β showing the distribution pattern of the performance data in the whole network, and can further determine a parameter α showing the correlation of the performance data of different network nodes in the whole network. Together, α and β show the distribution pattern of the performance data of different network nodes in the network, which is the key factor in establishing the network performance model. The network performance model established according to the present disclosure therefore reliably shows not only the performance of the network nodes that provide the performance data but also that of the other network nodes in the network, and is thus reliable in showing the performance of the whole network.

Further areas of applicability will become apparent from the description provided herein. It should be understood that the description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.

DRAWINGS

The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.

FIG. 1 is a simplified flow chart of a method for establishing a network model according to an embodiment;

FIG. 2 is a simplified flow chart of a method for estimating parameters α and β by using the maximum likelihood approach according to an embodiment;

FIG. 3a is a schematic diagram of the internal structure of an LDA model according to an embodiment before an intermediate variable is introduced;

FIG. 3b is a schematic diagram of the internal structure of an LDA model according to an embodiment after an intermediate variable is introduced;

FIG. 3c is a schematic diagram of the internal structure of an LDA model according to an embodiment with a Gaussian distribution introduced;

FIG. 4 is a schematic structural diagram of a system for establishing a network performance model according to an embodiment; and

FIG. 5 is a schematic structural diagram of an apparatus for establishing a network performance model according to an embodiment.

DETAILED DESCRIPTION

The following description is merely exemplary in nature and is not intended to limit the present disclosure, application, or uses.

Reference throughout this specification to “one embodiment,” “an embodiment,” “specific embodiment,” or the like in the singular or plural means that one or more particular features, structures, or characteristics described in connection with the embodiment are included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment,” “specific embodiment,” or the like in the singular or plural in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In order to make the objective, technical scheme and merits more apparent, a detailed description is hereinafter given with reference to specific embodiments and accompanying drawings.

A method in embodiments mainly includes: determining, according to performance data provided by network nodes and the probability of the performance data, a parameter α showing the correlation of the performance data of different network nodes in a whole network and a parameter β showing the distribution pattern of the performance data in the network; and establishing a Latent Dirichlet Allocation (LDA) network performance model by using the determined parameter α and parameter β.

The method may further include: generating simulated performance data for network nodes in the network by using the established LDA network performance model, and eventually establishing a network performance simulation environment.

FIG. 1 is a simplified flow chart of a method for establishing a network model according to an embodiment. As shown in FIG. 1, the method mainly includes the following processes.

Block 101: A parameter α showing the correlation of performance data of different network nodes in a whole network and a parameter β showing the distribution pattern of the performance data in the network are determined by using the performance data provided by the network nodes.

In this process, the performance data collected by the network nodes may be bandwidth, delay or other performance data.

In addition, the collected performance data may be further processed to reduce the sample space, and the processed performance data are then used as the sample data. The processing may include: when the collected performance data are more precise than needed, reducing their precision by rounding, e.g., rounding 5.21 to 5; and when the range of the collected performance data is too broad, dividing the data into blocks, e.g., using 1 to indicate [0,10) and 2 to indicate [10,20).
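The two preprocessing options can be sketched in Python as follows. This is a minimal illustration that assumes non-negative measurements, rounding half up, and blocks of width 10 matching the example above; the helper names are hypothetical.

```python
import math
from typing import Iterable, List


def round_value(x: float) -> int:
    """Reduce precision by rounding half up, e.g. 5.21 -> 5."""
    return math.floor(x + 0.5)


def block_index(x: float, width: float = 10.0) -> int:
    """Map a non-negative value to a 1-based block: [0,10) -> 1, [10,20) -> 2, ..."""
    if x < 0:
        raise ValueError("performance data are assumed to be non-negative")
    return int(x // width) + 1


def preprocess(samples: Iterable[float], use_blocks: bool, width: float = 10.0) -> List[int]:
    """Turn raw measurements into discrete sample data (one symbol per measurement)."""
    if use_blocks:
        return [block_index(x, width) for x in samples]
    return [round_value(x) for x in samples]
```

Either way the output is a small set of discrete symbols, which is what the sample space V of the LDA model ranges over.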

In this process, the correlation of the performance data of the network nodes in the network is obtained from the performance data provided by a subset of the network nodes, and the distribution pattern of the performance data is then obtained accordingly. The distribution pattern of the performance data of all network nodes in the network can therefore be obtained.

In this process, α and β are determined as the values under which the observed performance data occur with the maximum probability. The determination can be made with the maximum likelihood approach or other approaches.

The flow chart shown in FIG. 2 can be used for the determination. In this embodiment, the maximum likelihood approach is used for estimating parameters α and β. As shown in FIG. 2, the process of estimating the parameters α and β is described as follows.

Block 201: The number of components K of the network model is initialized, and the parameters α and β are initialized.

In this process, K indicates that the correlation of the network nodes in the network is determined by K factors, where K≥2. The value of K is usually chosen based on experience. α is a K-dimensional parameter and β is a K×V-dimensional parameter, where V indicates the sample space (the number of distinct values) of the received performance data of the network nodes. Normally the initial value of the parameter α is 1 and the initial value of the parameter β is 0.

Block 202: A likelihood function of the parameters α and β is established.

The likelihood function l(α,β) established in this process can be:

$\begin{matrix} {{l\left( {\alpha,\beta} \right)} = {\sum\limits_{d = 1}^{M}\;{\log\;{{p\left( {{w_{d}❘\alpha},\beta} \right)}.}}}} & (1) \end{matrix}$

M is the number of network nodes chosen to provide performance data, and w_(d) is the performance data sent by the d-th network node. The likelihood function l(α,β) also involves the latent internal variables θ and Z_(d). θ indicates the distribution pattern of the performance data of different network nodes and follows the Dirichlet distribution Dir(α). Both θ and α are K-dimensional parameters.

$\begin{matrix} {{\mathrm{Dir}(x\mid\alpha)} = {\frac{1}{B(\alpha)}\prod_{i=1}^{K}x_i^{\alpha_i-1}} = {\frac{\Gamma\left(\sum_{i=1}^{K}\alpha_i\right)}{\prod_{i=1}^{K}\Gamma(\alpha_i)}\prod_{i=1}^{K}x_i^{\alpha_i-1}}} & (2) \end{matrix}$

Here x=(x_(1), . . . , x_(K)) lies on the probability simplex (x_(i)>0, Σ_(i)x_(i)=1) and B(α) is the multivariate Beta function.
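Equation (2) can be evaluated directly in Python. The sketch below is a minimal illustration (the function name is hypothetical); it works in log space with the standard-library `math.lgamma`, which avoids overflow of Γ for large α.

```python
import math
from typing import Sequence


def dirichlet_log_pdf(x: Sequence[float], alpha: Sequence[float]) -> float:
    """log Dir(x | alpha) for x on the probability simplex, per Equation (2)."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must both have K entries")
    if any(a <= 0 for a in alpha):
        raise ValueError("alpha must be positive")
    if any(xi <= 0 for xi in x) or not math.isclose(sum(x), 1.0):
        raise ValueError("x must be strictly positive and sum to 1")
    # log of the normalizer Gamma(sum alpha) / prod Gamma(alpha_i), i.e. -log B(alpha)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # log of the product term prod x_i^(alpha_i - 1)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
```

For example, with α=(2,2) the density at x=(0.5,0.5) is Γ(4)/(Γ(2)Γ(2))·0.5·0.5=1.5, and with α=(1,1,1) the density is the constant Γ(3)=2.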

Z_(d) indicates the distribution pattern of the performance data of the d-th network node and follows the multinomial distribution Multinomial(θ); the probability of w_(d) given Z_(d) and β is p(w_(d)|Z_(d),β).

Block 203: The parameters α and β that maximize the likelihood function are calculated from the received performance data.

The likelihood function l(α,β) established in Block 202 gives the log-probability of the observed performance data w_(d) when the correlation of the performance data of the network nodes is given by α and the distribution pattern of the performance data is given by β. Therefore, the parameters α and β that maximize l(α,β) are exactly those under which the observed data w_(d) occur with the maximum probability.

Because computing p(w_(d)|α,β) in Equation (1) requires integrating over θ and summing over Z_(d), with θ and β coupled inside the integral, it is difficult to calculate directly. Variational inference can be employed to simplify the calculation of α and β, as follows:

$\begin{matrix} \begin{matrix} {{\log\;{p\left( {{w_{d}❘\alpha},\beta} \right)}} = {\log{\int{\sum\limits_{Z_{d}}\;{{p\left( {\theta,Z_{d},{w_{d}❘\alpha},\beta} \right)}{\mathbb{d}\theta}}}}}} \\ {= {\log\;{\int{\sum\limits_{z_{d}}\;{{{q\left( {\theta,{Z_{d}❘\gamma},\varphi} \right)} \cdot \frac{p\left( {\theta,Z_{d},{w_{d}❘\alpha},\beta} \right)}{\;{q\left( {\theta,{Z_{d}❘\gamma},\varphi} \right)}}}{\mathbb{d}\theta}}}}}} \\ {\geq {\int{\sum\limits_{z_{d}}\;{{q\left( {\theta,{Z_{d}❘\gamma},\varphi} \right)}\log\frac{p\left( {\theta,Z_{d},{w_{d}❘\alpha},\beta} \right)}{\;{q\left( {\theta,{Z_{d}❘\gamma},\varphi} \right)}}{{\mathbb{d}\theta}.}}}}} \end{matrix} & (3) \end{matrix}$

In the calculation, γ and φ are intermediate (variational) variables introduced by the variational inference; γ is a K-dimensional parameter and φ is a K×V-dimensional parameter. Likewise, q is an introduced intermediate distribution that factorizes as q(θ,Z_(d)|γ,φ)=q(θ|γ)q(Z_(d)|φ), and E_(q)[·] denotes the expected value taken with respect to q. FIG. 3a shows the internal structure of an LDA model before the intermediate variables are introduced, and FIG. 3b shows the internal structure after they are introduced.

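The inequality in Equation (3) follows from Jensen's inequality, because the logarithm is concave. The difference between the left-hand and right-hand sides of the inequality is the Kullback-Leibler (K-L) divergence between q and the true posterior p(θ,Z_(d)|w_(d),α,β):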

$\begin{matrix} \begin{matrix} {D\left(q(\theta,Z_d\mid\gamma,\varphi)\,\|\,p(\theta,Z_d\mid w_d,\alpha,\beta)\right) = \log p(w_d\mid\alpha,\beta) - \int\sum_{Z_d}q(\theta,Z_d\mid\gamma,\varphi)\log\frac{p(\theta,Z_d,w_d\mid\alpha,\beta)}{q(\theta,Z_d\mid\gamma,\varphi)}\,d\theta} \\ {= \log p(w_d\mid\alpha,\beta) - \left(E_q\left[\log p(\theta,Z_d,w_d\mid\alpha,\beta)\right] - E_q\left[\log q(\theta,Z_d\mid\gamma,\varphi)\right]\right)} \end{matrix} & (4) \end{matrix}$

It follows from Equation (4) that:

$\log p(w_d\mid\alpha,\beta) = E_q\left[\log p(\theta,Z_d,w_d\mid\alpha,\beta)\right] - E_q\left[\log q(\theta,Z_d\mid\gamma,\varphi)\right] + D\left(q(\theta,Z_d\mid\gamma,\varphi)\,\|\,p(\theta,Z_d\mid w_d,\alpha,\beta)\right)$
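The identity above can be checked numerically on a toy problem. The sketch below replaces the continuous θ and the latent Z_(d) with a single discrete latent variable z, an assumption made only so that the expectations become finite sums, and confirms that the log-probability equals the lower bound plus the K-L divergence. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def decompose_log_evidence(joint: Sequence[float],
                           q: Sequence[float]) -> Tuple[float, float, float]:
    """For a discrete latent z and one fixed observation w:
    joint[z] = p(z, w) and q[z] is a variational distribution that sums to 1.
    Returns (log p(w), E_q[log p(z, w)] - E_q[log q(z)], KL(q || p(z | w)))."""
    evidence = sum(joint)
    posterior = [pz / evidence for pz in joint]
    bound = sum(qz * (math.log(pz) - math.log(qz)) for qz, pz in zip(q, joint) if qz > 0)
    kl = sum(qz * (math.log(qz) - math.log(pp)) for qz, pp in zip(q, posterior) if qz > 0)
    return math.log(evidence), bound, kl
```

Whatever q is chosen, the bound never exceeds log p(w), and it becomes exact when q equals the posterior (K-L divergence zero).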

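Because the K-L divergence is non-negative, D(q(θ,Z_(d)|γ,φ)∥p(θ,Z_(d)|w_(d),α,β))≥0, so E_(q)[log p(θ,Z_(d),w_(d)|α,β)]−E_(q)[log q(θ,Z_(d)|γ,φ)] is a lower bound on log p(w_(d)|α,β). Since log p(w_(d)|α,β) does not depend on γ and φ, maximizing this lower bound over γ and φ minimizes the K-L divergence and makes the bound as tight as possible; maximizing the tightened bound over α and β then approximately maximizes log p(w_(d)|α,β). The lower bound is denoted L(γ,φ,α,β) and expands as follows: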

$\begin{matrix} \begin{matrix} {L(\gamma,\varphi,\alpha,\beta) = E_q\left[\log p(\theta,Z_d,w_d\mid\alpha,\beta)\right] - E_q\left[\log q(\theta,Z_d\mid\gamma,\varphi)\right]} \\ {= E_q\left[\log p(\theta\mid\alpha)\right] + E_q\left[\log p(Z_d\mid\theta)\right] + E_q\left[\log p(w_d\mid Z_d,\beta)\right] - E_q\left[\log q(\theta\mid\gamma)\right] - E_q\left[\log q(Z_d\mid\varphi)\right]} \\ {= \log\Gamma\left(\sum_{j=1}^{K}\alpha_j\right) - \sum_{i=1}^{K}\log\Gamma(\alpha_i) + \sum_{i=1}^{K}(\alpha_i-1)\left(\Psi(\gamma_i)-\Psi\left(\sum_{j=1}^{K}\gamma_j\right)\right)} \\ {+ \sum_{n=1}^{N}\sum_{i=1}^{K}\varphi_{ni}\left(\Psi(\gamma_i)-\Psi\left(\sum_{j=1}^{K}\gamma_j\right)\right) + \sum_{n=1}^{N}\sum_{i=1}^{K}\sum_{j=1}^{V}\varphi_{ni}w_n^{j}\log\beta_{ij}} \\ {- \log\Gamma\left(\sum_{j=1}^{K}\gamma_j\right) + \sum_{i=1}^{K}\log\Gamma(\gamma_i) - \sum_{i=1}^{K}(\gamma_i-1)\left(\Psi(\gamma_i)-\Psi\left(\sum_{j=1}^{K}\gamma_j\right)\right) - \sum_{n=1}^{N}\sum_{i=1}^{K}\varphi_{ni}\log\varphi_{ni}} \end{matrix} & (5) \end{matrix}$

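In Formula (5), N is the number of performance data items provided by the d-th network node, w_(n)^(j) equals 1 if the n-th item takes the j-th value of the sample space and 0 otherwise, and Ψ is the digamma function (the first derivative of log Γ). With α and β held fixed as known values, Formula (5) is maximized with respect to γ and φ by setting its partial derivatives to zero, subject to Σ_(i)φ_(ni)=1 for each n. The optimized values of γ and φ are: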

$\begin{matrix} \begin{matrix} {\varphi_{ni} \propto \beta_{iv}\exp\left(\Psi(\gamma_i)-\Psi\left(\sum_{j=1}^{K}\gamma_j\right)\right)} \\ {\gamma_i = \alpha_i + \sum_{n=1}^{N}\varphi_{ni}} \end{matrix} & (6) \end{matrix}$

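In Formula (6), v is the sample-space index of the n-th performance data item, and each φ_(n) is normalized so that Σ_(i)φ_(ni)=1. The values of all γ_(i) and φ_(ni) in Formula (6) are calculated by iteration: starting from the initial values of α and β, the two updates are applied alternately until γ and φ converge.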
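The iteration of Formula (6) for a single network node can be sketched in Python as below. This is a minimal illustration under stated assumptions: each performance data item is already a sample-space index v, β has strictly positive rows, γ starts at α_(i)+N/K (a common choice, not specified in the text), and the digamma function Ψ is hand-rolled to avoid a SciPy dependency. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def digamma(x: float) -> float:
    """Psi(x) for x > 0: shift with Psi(x) = Psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def variational_update(words: Sequence[int], alpha: Sequence[float],
                       beta: Sequence[Sequence[float]], max_iter: int = 200,
                       tol: float = 1e-8) -> Tuple[List[float], List[List[float]]]:
    """Iterate Formula (6) for one network node d.
    words[n] is the sample-space index v of the n-th data item; beta is K x V."""
    K, N = len(alpha), len(words)
    gamma = [a + N / K for a in alpha]          # assumed starting point
    phi = [[1.0 / K] * K for _ in range(N)]
    for _ in range(max_iter):
        psi_total = digamma(sum(gamma))
        weights = [math.exp(digamma(g) - psi_total) for g in gamma]  # exp(Psi(gamma_i) - Psi(sum))
        phi = []
        for v in words:
            row = [beta[i][v] * weights[i] for i in range(K)]        # phi_ni proportional to beta_iv * weight_i
            total = sum(row)
            phi.append([r / total for r in row])                     # enforce sum_i phi_ni = 1
        new_gamma = [alpha[i] + sum(row[i] for row in phi) for i in range(K)]
        converged = max(abs(a - b) for a, b in zip(new_gamma, gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, phi
```

Because every row of φ sums to 1, the returned γ always satisfies Σ_(i)γ_(i)=Σ_(i)α_(i)+N, which is a handy sanity check.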

The obtained values of γ and φ are then used in the likelihood function of Equation (1):

$l(\alpha,\beta)=\sum_{d=1}^{M}\log p(w_d\mid\alpha,\beta)$

in which each log p(w_(d)|α,β) is replaced with its lower bound L(γ,φ,α,β) from Equation (5), evaluated with the optimized γ and φ of the d-th node. Taking α and β as the independent variables, the extremum of the resulting sum is calculated, which gives:

$\begin{matrix} {\beta_{ij} \propto \sum_{d=1}^{M}\sum_{n=1}^{N_d}\varphi_{dni}^{*}w_{dn}^{j}} & (7) \\ {L(\alpha) = \sum_{d=1}^{M}\left(\log\Gamma\left(\sum_{j=1}^{K}\alpha_j\right) - \sum_{i=1}^{K}\log\Gamma(\alpha_i) + \sum_{i=1}^{K}(\alpha_i-1)\left(\Psi(\gamma_{di}) - \Psi\left(\sum_{j=1}^{K}\gamma_{dj}\right)\right)\right)} & (8) \end{matrix}$

Here N_(d) is the number of performance data items of the d-th node and φ*_(dni) is the optimized value of φ_(ni) for that node. The value of β is calculated from Formula (7) using the optimized φ*, with each row normalized so that Σ_(j)β_(ij)=1. Formula (8) collects the terms of the bound that depend on α; because it has no closed-form maximizer, α is obtained by calculating the extremum of Formula (8) iteratively, for example with the Newton-Raphson method.
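Both updates can be sketched in Python. This is an illustration under stated assumptions, not the disclosed implementation: φ* is given per node as a list of rows, digamma and trigamma are hand-rolled series approximations, and the Newton-Raphson step for Formula (8) exploits the diagonal-plus-rank-one structure of its Hessian. The step-halving safeguard (keeping α positive and Formula (8) non-decreasing) is an added assumption. All names are hypothetical.

```python
import math
from typing import List, Sequence


def digamma(x: float) -> float:
    """Psi(x), x > 0: recurrence Psi(x) = Psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x: float) -> float:
    """Psi'(x), x > 0: recurrence Psi'(x) = Psi'(x + 1) + 1/x^2, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv, inv2 = 1.0 / x, 1.0 / (x * x)
    return result + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def update_beta(docs: Sequence[Sequence[int]], phis, K: int, V: int) -> List[List[float]]:
    """Formula (7): beta_ij proportional to sum_d sum_n phi*_dni * w_dn^j, rows normalized."""
    beta = [[0.0] * V for _ in range(K)]
    for words, phi in zip(docs, phis):
        for v, row in zip(words, phi):          # w_dn^j is 1 only for j = v
            for i in range(K):
                beta[i][v] += row[i]
    for row in beta:
        total = sum(row)
        row[:] = [b / total for b in row]
    return beta


def alpha_stats(gammas) -> List[float]:
    """ss_i = sum_d (Psi(gamma_di) - Psi(sum_j gamma_dj)), the data term of Formula (8)."""
    ss = [0.0] * len(gammas[0])
    for g in gammas:
        psi_total = digamma(sum(g))
        for i, gi in enumerate(g):
            ss[i] += digamma(gi) - psi_total
    return ss


def alpha_objective(alpha, ss, M):
    """Formula (8) written in terms of ss."""
    return (M * (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha))
            + sum((a - 1.0) * s for a, s in zip(alpha, ss)))


def alpha_gradient(alpha, ss, M):
    psi_total = digamma(sum(alpha))
    return [M * (psi_total - digamma(a)) + s for a, s in zip(alpha, ss)]


def update_alpha(alpha, gammas, max_iter=100, tol=1e-8):
    """Newton-Raphson on Formula (8); Hessian = diag(h) + z * 1 1^T is inverted in O(K)."""
    M, alpha, ss = len(gammas), list(alpha), alpha_stats(gammas)
    for _ in range(max_iter):
        g = alpha_gradient(alpha, ss, M)
        if max(abs(gi) for gi in g) < tol:
            break
        h = [-M * trigamma(a) for a in alpha]
        z = M * trigamma(sum(alpha))
        c = sum(gi / hi for gi, hi in zip(g, h)) / (1.0 / z + sum(1.0 / hi for hi in h))
        step = [(gi - c) / hi for gi, hi in zip(g, h)]   # H^{-1} g
        f_old, t = alpha_objective(alpha, ss, M), 1.0
        while t > 1e-10:   # halve the step until alpha stays positive and (8) does not drop
            cand = [a - t * s for a, s in zip(alpha, step)]
            if all(a > 0 for a in cand) and alpha_objective(cand, ss, M) >= f_old - 1e-12:
                break
            t *= 0.5
        else:
            break          # no acceptable step; keep the current alpha
        alpha = cand
    return alpha
```

Solving with the structured Hessian costs O(K) per step instead of the O(K³) of a general matrix inverse.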

Preferably, Blocks 201 to 203 are repeated: the α and β just obtained serve as the initial values for recalculating γ and φ, which are in turn used to recalculate α and β. This is repeated until the values of α and β converge, and the convergent values are taken as the final values of α and β.
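The outer repetition can be sketched as a generic convergence loop. In this minimal Python illustration, `one_round` is a hypothetical caller-supplied function standing for one pass of Blocks 201 to 203 (the variational updates for every node followed by Formulas (7) and (8)); the tolerance and round limit are assumptions, since the text does not specify a convergence test.

```python
from typing import Callable, List, Tuple

Params = Tuple[List[float], List[List[float]]]   # (alpha, beta)


def max_change(old: Params, new: Params) -> float:
    """Largest absolute change in any entry of alpha or beta between two rounds."""
    (a_old, b_old), (a_new, b_new) = old, new
    da = max(abs(x - y) for x, y in zip(a_old, a_new))
    db = max(abs(x - y) for r0, r1 in zip(b_old, b_new) for x, y in zip(r0, r1))
    return max(da, db)


def fit_until_converged(params: Params, one_round: Callable[[Params], Params],
                        tol: float = 1e-6, max_rounds: int = 500) -> Tuple[Params, int]:
    """Repeat one pass of Blocks 201-203 until alpha and beta stop changing."""
    for rounds in range(1, max_rounds + 1):
        new_params = one_round(params)
        if max_change(params, new_params) < tol:
            return new_params, rounds
        params = new_params
    return params, max_rounds
```

The returned (α, β) pair is what Block 204 saves and later reuses as the starting point.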

Block 204: The obtained values of parameters α and β are saved.

The saved parameters α and β are used as the initial values of α and β in the iterative calculation the next time a network performance model is established.

Block 102: An LDA network performance model is established by using the obtained parameters α and β.

In this process, the LDA network performance model is established by using α and β to determine the distributions of the internal variables of the LDA model. The internal variable θ follows the Dir(α) distribution, i.e.,

$p(\theta\mid\alpha)=\frac{\Gamma\left(\sum_{i=1}^{K}\alpha_i\right)}{\prod_{i=1}^{K}\Gamma(\alpha_i)}\theta_1^{\alpha_1-1}\theta_2^{\alpha_2-1}\cdots\theta_K^{\alpha_K-1}$

For the d-th node, Z_(d) follows the Multinomial(θ) distribution and d∈V.

When the network performance model is established, the following processes can be performed with the established network performance model.

Block 103: Performance data is generated with the established LDA network performance model.

Because the parameter α in the established LDA model shows the correlation of the performance data of the network nodes in the network and the parameter β shows the distribution pattern of the performance data, the LDA model including the combination of the parameters α and β can show the performance of the whole network.

In this process, suppose the simulated performance data {w₁, w₂, . . . , w_(n), . . . , w_(N)} of a network node are to be generated with the established LDA network performance model. Generating the simulated performance data of the d-th network node includes: drawing the number N of simulated performance data items from a Poisson distribution; and drawing θ from the Dir(α) distribution by using the determined parameter α, i.e.,

$p(\theta\mid\alpha)=\frac{\Gamma\left(\sum_{i=1}^{K}\alpha_i\right)}{\prod_{i=1}^{K}\Gamma(\alpha_i)}\theta_1^{\alpha_1-1}\theta_2^{\alpha_2-1}\cdots\theta_K^{\alpha_K-1}$

Then, for each n=1, . . . , N, Z_(dn), which corresponds to w_(dn), is drawn from the Multinomial(θ) distribution, and w_(dn) is drawn from p(w_(dn)|Z_(dn),β). Repeating these processes for each node allows the LDA network performance model to generate simulated performance data for multiple network nodes. Here w_(dn) is the n-th simulated performance data item of the d-th network node, and Z_(dn) indicates the distribution pattern of that item.
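The generative procedure above can be sketched with Python's standard `random` module. This is a minimal illustration that assumes the Poisson mean, here the hypothetical parameter `mean_count`, is supplied by the user, since the text does not state it. θ~Dir(α) is drawn by normalizing independent Gamma(α_(i),1) variates, and the Poisson draw uses Knuth's multiplication method, adequate for small means. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method for N ~ Poisson(mean)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """theta ~ Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_node(alpha, beta, mean_count: float, rng: random.Random) -> List[int]:
    """Simulated performance data (sample-space indices) for one network node."""
    n_items = sample_poisson(mean_count, rng)                        # N ~ Poisson
    theta = sample_dirichlet(alpha, rng)                             # theta ~ Dir(alpha)
    data = []
    for _ in range(n_items):
        z = rng.choices(range(len(alpha)), weights=theta)[0]         # Z_dn ~ Multinomial(theta)
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]     # w_dn ~ p(w | Z_dn, beta)
        data.append(w)
    return data
```

Calling `generate_node` once per node, each time with a fresh θ, reproduces the per-node repetition described above.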

The established LDA model shows the performance of all network nodes in the whole network. Therefore, the simulated performance data generated in Block 103 can belong to the network nodes that provided performance data in Block 101 or to other network nodes in the network.

A network performance simulation environment can be established with the simulated performance data of network nodes generated in Block 103.

The simulated performance data generated in Block 103 are assigned to the corresponding network nodes in the simulation environment. When the simulated performance data include the delays and bandwidths of the network nodes, they are assigned to the nodes in the simulation environment as those nodes' delays and bandwidths, yielding a simulation environment with the same distribution pattern as the real network. For example, when a simulation environment embodying the delays of the network nodes is needed, the parameter α showing the correlation of the performance data of the network nodes in the network and the parameter β showing the distribution pattern of the performance data are determined from the delay data provided by a subset of the network nodes and the probability of the delay data. An LDA model is established with the determined parameters α and β, simulated delay data for all network nodes in the network are generated with this model, and the network performance simulation environment is established with the simulated delay data. Tests can then be run in the simulation environment to provide evidence for the optimization and QoS control of the real network.
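If the delay data were divided into blocks as in Block 101, the generated block indices have to be turned back into delay values before they are assigned to nodes. The text does not say how this is done; the Python sketch below assumes, purely for illustration, that each 1-based block index is mapped to the midpoint of its interval and that nodes are identified by hypothetical string IDs.

```python
from typing import Dict, List, Sequence


def block_midpoint(index: int, width: float = 10.0) -> float:
    """Inverse of the 1-based blocking: block k covers [(k-1)*width, k*width)."""
    return (index - 0.5) * width


def build_delay_environment(node_ids: Sequence[str],
                            simulated_blocks: Sequence[Sequence[int]],
                            width: float = 10.0) -> Dict[str, List[float]]:
    """Assign each node its simulated delays, mapping block indices to block midpoints."""
    if len(node_ids) != len(simulated_blocks):
        raise ValueError("one list of simulated data per node is required")
    return {node: [block_midpoint(k, width) for k in blocks]
            for node, blocks in zip(node_ids, simulated_blocks)}
```

Other decodings, such as drawing uniformly within the block, would work the same way; the midpoint simply keeps the example deterministic.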

Furthermore, in the flow shown in FIG. 2, w_(d) given Z_(d) and β may also follow a Gaussian distribution, so p(w_(d)|Z_(d),β) in the preceding processes can be replaced with a Gaussian distribution. The parameters α and β can still be calculated with the method of FIG. 2 once the parameters are changed accordingly: β becomes a parameter of 2×K×K_(s) dimension, where K_(s) is the number of Gaussian components and each Gaussian distribution has the two parameters μ and σ. The internal structure of the corresponding LDA model is shown in FIG. 3c. The detailed calculation is not described further herein.

FIG. 4 is a structure scheme of a system for establishing a network performance model according to an embodiment. As shown in FIG. 4, the system includes: network node 401 and performance model establishing device 402.

Network node 401 is adapted to provide performance data of the network node itself.

Performance model establishing device 402 is adapted to determine, according to the performance data provided by network node 401 and the probability of the performance data, the parameter α showing the correlation of the performance data of network nodes and the parameter β showing the distribution pattern of the performance data in the network, and to establish an LDA network performance model with the determined parameters α and β.

The system may further include: performance data generating device 403, adapted to generate performance data by using the network performance model established at performance model establishing device 402.

Performance data generating device 403 can be a standalone device or be integrated into performance model establishing device 402.

A structure of performance model establishing device 402 is shown in FIG. 5, mainly including: parameter determining unit 510 and model establishing unit 520.

Parameter determining unit 510 is adapted to determine, according to performance data provided by network nodes and the probability of the performance data, a parameter α showing the correlation of the performance data of different network nodes and a parameter β showing the distribution pattern of the performance data in the network.

Model establishing unit 520 is adapted to establish an LDA network performance model by using the parameter α and the parameter β determined by parameter determining unit 510.

The device may further include: performance data generating unit 530, adapted to generate the performance data by using the network performance model established by model establishing unit 520.

Parameter determining unit 510 may further include: initiating unit 511, likelihood function setup unit 512 and parameter calculating unit 513.

Initiating unit 511 is adapted to initialize the component number K of the network performance model.

Likelihood function setup unit 512 is adapted to establish a likelihood function l(α,β) of the parameters α and β according to the component number K from the initiating unit 511. The parameter α is a parameter of K dimension, the parameter β is a parameter of K×V dimension and V indicates the sample space of the performance data.

Parameter calculating unit 513 is adapted to calculate, according to the performance data from network nodes, the values of the parameters α and β that allow the likelihood function l(α,β) established at likelihood function setup unit 512 to reach the maximum value.

Parameter calculating unit 513 may further include: equivalent function establishing unit 5131, intermediate variable calculating unit 5132 and parameters α and β calculating unit 5133.

Equivalent function establishing unit 5131 is adapted to introduce intermediate parameters γ and φ into the likelihood function l(α,β) established at likelihood function setup unit 512 to obtain a simplified equivalent function L(γ,φ,α,β) of the likelihood function l(α,β). γ is a parameter of K dimension and φ is a parameter of K×V dimension.

Intermediate variable calculating unit 5132 is adapted to calculate, with the parameters α and β as the known variables and the parameters γ and φ as the independent variables, the optimized values of the parameters γ and φ by calculating the extremum of the simplified equivalent function L(γ,φ,α,β) established by equivalent function establishing unit 5131.

Parameters α and β calculating unit 5133 is adapted to use the optimized values of the parameters γ and φ from intermediate variable calculating unit 5132 in the simplified equivalent function L(γ,φ,α,β) and calculate the extremum of the simplified equivalent function L(γ,φ,α,β) with the parameters α and β as the independent variables, and eventually obtain the values of the parameters α and β.

Model establishing unit 520 may further include: parameter θ determining unit 521, adapted to determine, according to the parameters α and β determined by parameter determining unit 510, the Dir(α) distribution with which the internal variable θ of the LDA model complies; and

parameter Z_(d) determining unit 522, adapted to determine, according to the parameters α and β determined by parameter determining unit 510, the Multinomial(θ) distribution with which the internal variable Z_(d) of the LDA model complies.

It can be seen from the preceding description that the method and apparatus provided by embodiments for establishing the network performance model can determine, according to the performance data provided by network nodes and the probability of the performance data, the parameter α showing the correlation of the performance data of network nodes and the parameter β showing the distribution pattern of the performance data in the network, and can further establish an LDA network performance model with the determined parameters α and β. From the received performance data, the method in embodiments determines not only the parameter β showing the distribution pattern of the performance data, but also the parameter α showing the correlation of the performance data of the network nodes in the network. The distribution pattern of the performance data of all network nodes in the whole network can therefore be obtained by using α and β. This distribution pattern is the basis of the network performance model and enables the model to reliably show the performance of the network nodes that provide the performance data as well as that of all other network nodes in the whole network, i.e., to be reliable in showing the performance of the whole network.

Variational inference combined with the maximum likelihood approach is employed in the embodiments to determine the values of the parameters α and β that make the performance data occur with the maximum probability. As a result, the simulation environment established with the network performance model is closer to the performance of the actual network, while the calculation of the parameters α and β is simpler and requires less data to be processed.
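The variational step can be illustrated with a minimal sketch of the coordinate-ascent updates for γ and φ. These are the updates used in the standard variational treatment of LDA, applied here to a single node with α and β held fixed. The sketch assumes that the performance data of the node are integer symbols in 0..V-1, that every entry of α and β is positive, and that φ is indexed per sample (an N×K table). The helper names digamma and e_step are hypothetical. The digamma function is implemented inline only to keep the example free of third-party libraries.

```python
import math
from typing import List, Tuple


def digamma(x: float) -> float:
    """Digamma for x > 0: upward recurrence, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def e_step(doc: List[int], alpha: List[float], beta: List[List[float]],
           iters: int = 50, tol: float = 1e-8) -> Tuple[List[float], List[List[float]]]:
    """Optimize the variational parameters gamma and phi for one node (sketch)."""
    K, N = len(alpha), len(doc)
    # Common initialization: uniform phi, gamma = alpha + N/K.
    phi = [[1.0 / K] * K for _ in range(N)]
    gamma = [a + N / K for a in alpha]
    for _ in range(iters):
        # phi[n][i] is proportional to beta[i][w_n] * exp(digamma(gamma[i]));
        # the exp(-digamma(sum(gamma))) factor cancels in the normalization.
        weights = [math.exp(digamma(g)) for g in gamma]
        for n, w in enumerate(doc):
            row = [beta[i][w] * weights[i] for i in range(K)]
            total = sum(row)
            phi[n] = [r / total for r in row]
        # gamma[i] = alpha[i] + sum over samples of phi[n][i].
        new_gamma = [alpha[i] + sum(phi[n][i] for n in range(N)) for i in range(K)]
        converged = max(abs(a - b) for a, b in zip(new_gamma, gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, phi
```

For example, e_step([0, 0, 1], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]) returns a γ that favors the first component, because two of the three samples are the symbol that this component most likely produces. Alternating this step with the α and β updates gives the variational EM loop.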

The above are only exemplary embodiments and are not intended to limit the protection scope of the present disclosure. All modifications, equivalent replacements, or improvements within the scope, spirit, and principles of the present disclosure shall fall within the protection scope of the present disclosure.

What is claimed is:
 1. A method for establishing a network performance model used for showing the performance of a computer network node, the method comprising: receiving, by a performance model establishing device, performance data provided by the computer network node; rounding, by the performance model establishing device, the performance data or dividing the performance data into blocks; determining, by the performance model establishing device, according to at least one of the rounded and divided performance data, the performance data being provided by the computer network node, a parameter α showing a correlation of the performance data of different computer network nodes in a whole network and a parameter β showing a distribution pattern of the performance data in the network; and establishing, by the performance model establishing device, a Latent Dirichlet Allocation, LDA, network performance model by using the determined parameter α and the parameter β.
 2. The method of claim 1, wherein the determining the parameter α and the parameter β comprises: determining the parameters α and β that make the received performance data occur with the maximum probability.
 3. The method of claim 2, wherein the determining the parameters α and β that make the received performance data occur with the maximum probability comprises: determining the parameters α and β that make the received performance data occur with the maximum probability by using the maximum likelihood approach.
 4. The method of claim 3, wherein the determining the parameters α and β that make the received performance data occur with the maximum probability by using the maximum likelihood approach comprises: initializing a component number K of the LDA network performance model, in which K is greater than or equal to 2; establishing a likelihood function l(α,β) containing the parameters α and β; and calculating, according to the received performance data, the parameters α and β that make the likelihood function l(α,β) reach the maximum value.
 5. The method of claim 4, wherein the likelihood function l(α,β) containing the parameters α and β is $l(\alpha,\beta) = \sum_{d=1}^{M} \log p(w_d \mid \alpha, \beta)$, in which M is the number of computer network nodes selected for sending performance data, w_(d) is the performance data sent from the d^(th) computer network node, and p(w_(d)|α,β) is the probability of w_(d) occurring under the parameters α and β; and the likelihood function l(α,β) contains internal latent variables θ and Z_(d), θ showing a distribution pattern of the performance data of the computer network node, complying with a Dirichlet distribution Dir(α), and being a parameter of K dimension; Z_(d) showing the distribution pattern of the performance data of the d^(th) computer network node and complying with a multinomial distribution Multinomial(θ); α is a parameter of K dimension; and when the probability of w_(d) given the parameters Z_(d) and β is p(w_(d)|Z_(d),β), β is a parameter characterized by K and V, wherein V indicates the number of values in the sample space of the performance data.
 6. The method of claim 5, wherein, when the probability of w_(d) is p(w_(d)|Z_(d),β) with the parameters Z_(d) and β, the calculating of the parameters α and β that make the likelihood function l(α,β) reach the maximum value further comprises: calculating, by using variational inference, the parameters α and β that make the likelihood function l(α,β) reach the maximum value.
 7. The method of claim 6, wherein the calculating, by using variational inference, the parameters α and β that make the likelihood function l(α,β) reach the maximum value comprises: introducing intermediate variables γ and φ into the likelihood function l(α,β) to obtain a simplified equivalent function L(γ,φ,α,β) of the likelihood function l(α,β), in which γ is a parameter of K dimension and φ is a parameter of K×V dimension; calculating, with the parameters α and β as known variables and the parameters γ and φ as independent variables, optimized values of the parameters γ and φ by calculating the extremum of the simplified equivalent function L(γ,φ,α,β); and using the optimized values of the parameters γ and φ in the simplified equivalent function L(γ,φ,α,β), calculating the extremum of the simplified equivalent function L(γ,φ,α,β) with the parameters α and β as the independent variables, and obtaining the values of the parameters α and β as the values of the parameters α and β that make the likelihood function l(α,β) reach the maximum value.
 8. The method of claim 5, wherein the establishing the LDA network performance model by using the determined parameters α and β comprises: determining, according to the determined parameters α and β, the Dirichlet distribution Dir(α) with which the internal latent variable θ complies and the multinomial distribution Multinomial(θ) with which the internal latent variable Z_(d) complies.
 9. The method of claim 8, further comprising: generating simulated performance data by using the established LDA performance model.
 10. The method of claim 9, wherein the generating the simulated performance data by using the established LDA performance model comprises: selecting N as the number of performance data generated by the d^(th) computer network node, wherein N complies with a Poisson distribution; making, by using the determined parameter α, the internal variable θ comply with Dir(α); making Z_(dn), which corresponds to simulated performance data w_(dn), comply with the multinomial distribution Multinomial(θ); making w_(dn) comply with p(w_(dn)|Z_(dn),β) according to the determined parameter β when the probability of w_(dn) with the parameters Z_(dn) and β is p(w_(dn)|Z_(dn),β); or making w_(dn) comply with a Gaussian distribution when w_(dn) with the parameters Z_(dn) and β complies with a Gaussian distribution; wherein w_(dn) is the n^(th) simulated performance data of the d^(th) computer network node and Z_(dn) is a distribution pattern of the n^(th) performance data of the d^(th) computer network node.
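The generating steps recited above can be sketched as follows for the discrete case only; the Gaussian alternative is not shown. The sketch assumes that α has positive entries and that β is a K×V table whose rows are probability distributions over integer performance symbols 0..V-1. The parameter mean_count is a hypothetical Poisson mean for the number N of samples per node, a value the claim does not specify. The Dirichlet draw uses the standard construction of normalizing independent Gamma(α_i, 1) variables. The Poisson draw uses Knuth's multiplication method, which is adequate for small means. All function names are hypothetical.

```python
import math
import random
from typing import List


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; suitable for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Normalized independent Gamma(alpha_i, 1) draws follow Dir(alpha)."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def generate_node_data(alpha: List[float], beta: List[List[float]],
                       mean_count: float, rng: random.Random) -> List[int]:
    """Simulated performance symbols w_dn for one node d (discrete case)."""
    K, V = len(alpha), len(beta[0])
    n_samples = sample_poisson(mean_count, rng)       # N ~ Poisson(mean_count)
    theta = sample_dirichlet(alpha, rng)               # theta ~ Dir(alpha)
    data = []
    for _ in range(n_samples):
        z = rng.choices(range(K), weights=theta)[0]    # Z_dn ~ Multinomial(theta)
        w = rng.choices(range(V), weights=beta[z])[0]  # w_dn ~ p(w | Z_dn, beta)
        data.append(w)
    return data
```

Calling generate_node_data once per node with the same α and β yields simulated data for M nodes. These nodes share the K components but differ in their mixing proportions θ.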